
Rebuttal for "Revisiting the Evaluation of Image Synthesis with GANs"

Neural Information Processing Systems

Our presentation is organized as it is for the following reasons. In Section 2.3, we present the details of the generative models, the evaluated datasets, and the analysis approaches (including our visualization tool, histogram matching attack, and human evaluation). These are independent of each other, so we discuss them in parallel in the main paper. In Section 3.1, we investigate the feature extractors by first identifying their attention on visual semantics and then examining their robustness to the histogram matching attack. Finally, we filter out extractors that define similar representation spaces. These studies deepen progressively, so they are organized in a progressive manner.


Revisiting the Evaluation of Image Synthesis with GANs

Neural Information Processing Systems

A good metric, which promises a reliable comparison between solutions, is essential for any well-defined task. Unlike most vision tasks that have per-sample ground truth, image synthesis tasks target generating unseen data and hence are usually evaluated through a distributional distance between one set of real samples and another set of generated samples. This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set. Extensive experiments conducted on multiple datasets and settings reveal several important findings. Firstly, a group of models that include both CNN-based and ViT-based architectures serve as reliable and robust feature extractors for measurement evaluation. Secondly, Centered Kernel Alignment (CKA) provides a better comparison across various extractors and hierarchical layers in one model. Finally, CKA is more sample-efficient and enjoys better agreement with human judgment in characterizing the similarity between two internal data correlations. These findings contribute to the development of a new measurement system, which enables a consistent and reliable re-evaluation of current state-of-the-art generative models.
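To make the CKA measure mentioned above concrete, here is a minimal sketch of the standard linear form of Centered Kernel Alignment between two feature matrices. It is an illustration only: the function name `linear_cka`, the random stand-in features, and the choice of a linear kernel are assumptions, and the abstract does not say which kernel or how real and generated samples are paired in the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices x (n, d1) and y (n, d2).

    Both matrices must hold the same number of rows (samples).
    """
    # Center each feature dimension so the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance (unnormalized HSIC).
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    # Normalizers: Frobenius norms of the two self-covariances.
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for extractor features of two image sets.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(256, 32))
# An orthogonal rotation plus isotropic scaling of the same features.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
feats_b = 2.0 * feats_a @ rotation
# Unrelated features for comparison.
feats_c = rng.normal(size=(256, 32))

cka_same = linear_cka(feats_a, feats_b)
cka_diff = linear_cka(feats_a, feats_c)
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling of the features, so `cka_same` comes out at 1 up to floating-point error, while `cka_diff` is noticeably lower.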



RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars Dongwei Pan

Neural Information Processing Systems

Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios.


e4a6222cdb5b34375400904f03d8e6a5-Paper.pdf

Neural Information Processing Systems

In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints to the shape of the probability map by minimizing the expectation of the localization error.
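The distinction the abstract draws can be illustrated on a one-dimensional probability map. The sketch below contrasts the error of the expected location (what a soft-argmax style loss penalizes) with the expectation of the localization error. It computes the expectation exactly on a small discrete grid, whereas the method's name suggests a sampling-based approximation whose details are not given here. The function names and the toy maps are assumptions for illustration.

```python
import numpy as np


def error_of_expectation(prob: np.ndarray, target: float) -> float:
    """|E[x] - target|: error of the expected location (soft-argmax style)."""
    coords = np.arange(prob.size)
    return abs(float(prob @ coords) - target)


def expectation_of_error(prob: np.ndarray, target: float) -> float:
    """E[|x - target|]: expected localization error under the map."""
    coords = np.arange(prob.size)
    return float(prob @ np.abs(coords - target))


# A bimodal map whose mean sits on the target although no mass is there.
bimodal = np.zeros(11)
bimodal[[2, 8]] = 0.5
# A map concentrated on the target.
peaked = np.zeros(11)
peaked[5] = 1.0

target = 5.0
bimodal_soft = error_of_expectation(bimodal, target)
bimodal_expected = expectation_of_error(bimodal, target)
peaked_expected = expectation_of_error(peaked, target)
```

The bimodal map has zero error of expectation but a non-zero expected error, so only the second objective pushes probability mass toward the target. That is one way to read the "implicit constraints to the shape of the probability map".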


IKEA-Manual: Seeing Shape Assembly Step by Step

Neural Information Processing Systems

Due to the long-horizon nature of the task, we often heavily rely on visual manuals that provide step-by-step guidance during the assembly process.





Supplementary Materials for Incomplete Multimodality-Diffused Emotion Recognition

Neural Information Processing Systems

In this supplementary material, we first present the details of the conditional score network in Sec. 2. Sec. 4. Finally, we conduct experiments on the Chinese MER dataset CH-SIMS [ I) which is subsequently fixed for the model (i.e., not learnable).

Table 1: Hyperparameter settings in IMDer.

Hyperparameter                                     CMU-MOSI   CMU-MOSEI
Optimizer                                          Adam       Adam
Batch size                                         32         128
Learning rate                                      0.001      0.002
σ used in our stochastic differential equation     25         25
Number of iterations for Euler-Maruyama solver     500        500
Shallow Feature Extractor Kernel size for E

CH-SIMS contains 2281 refined video segments with fine-grained annotations of modalities. For vision modality, we use MultiComp OpenFace2.0

The experimental results are listed in Tab. 3. Our proposed IMDer consistently achieves better results than MMIN or GCNet under the random missing protocol.
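Table 1 lists σ = 25 and 500 iterations for the Euler-Maruyama solver. As a rough illustration of what such a solver does, here is a minimal sketch of reverse-time Euler-Maruyama sampling. Several parts are assumptions, because the excerpt does not state them: the forward SDE is taken to be the variance-exploding form dx = σ^t dw, the data distribution is a toy one-dimensional standard normal, and the learned conditional score network is replaced by the closed-form score of that toy problem. All function names are hypothetical.

```python
import numpy as np

SIGMA = 25.0    # sigma from Table 1
N_STEPS = 500   # Euler-Maruyama iterations from Table 1
EPS = 1e-3      # assumed stopping time, not given in the excerpt


def diffusion_coeff(t: float) -> float:
    """g(t) for the assumed forward SDE dx = sigma^t dw."""
    return SIGMA ** t


def marginal_var(t: float) -> float:
    """Variance added by the assumed forward SDE up to time t."""
    return (SIGMA ** (2.0 * t) - 1.0) / (2.0 * np.log(SIGMA))


def toy_score(x: np.ndarray, t: float) -> np.ndarray:
    """Exact score of N(0, 1) data perturbed by the assumed SDE.

    Stands in for a learned score network in this sketch.
    """
    return -x / (1.0 + marginal_var(t))


def euler_maruyama_sample(n_samples: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Start from the (approximate) prior at t = 1.
    x = rng.normal(size=n_samples) * np.sqrt(marginal_var(1.0))
    times = np.linspace(1.0, EPS, N_STEPS)
    dt = times[0] - times[1]
    for t in times:
        g = diffusion_coeff(t)
        # Reverse-time drift step followed by the stochastic step.
        x = x + (g ** 2) * toy_score(x, t) * dt
        x = x + g * np.sqrt(dt) * rng.normal(size=n_samples)
    return x


samples = euler_maruyama_sample(5000)
```

With the exact toy score, the samples should end up close to the standard normal data distribution, which gives a quick sanity check on the solver loop before a learned score model is plugged in.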